Monday, 17 July 2017
Conflict Management and Partition Handling
In Infinispan 9.1.0.Final we have overhauled the behaviour and configuration of partition handling in distributed and replicated caches. Partition handling is no longer simply enabled/disabled; instead, a partition strategy is configured. This allows for more fine-grained control of a cache’s behaviour when a split brain occurs. Furthermore, we have created the ConflictManager component so that conflicts on cache entries can be resolved on-demand by users and/or automatically during partition merges.
Conflict Manager
During a cache’s lifecycle it is possible for inconsistencies to appear between replicas of a cache entry, due to a variety of reasons (e.g. replication failures, incorrect use of flags, etc.). The conflict manager is a tool that allows users to retrieve all stored replica values for a cache entry, as well as to process a stream of cache entries whose stored replicas have conflicting values. Furthermore, by utilising implementations of the EntryMergePolicy interface, it is possible for said conflicts to be resolved deterministically.
EntryMergePolicy
In the event of conflicts arising between two or more replicas of a given CacheEntry, it is necessary for a conflict resolution algorithm to be defined; therefore we provide the EntryMergePolicy interface. This interface consists of a single method, "merge", whose output is utilised as the "resolved" CacheEntry for a given key. A non-null return value is put to all replicas of the CacheEntry in question, whereas a null return value results in all replicas being removed from the cache.
The merge method takes two parameters: the "preferredEntry" and "otherEntries". In the context of a partition merge, the preferredEntry is the CacheEntry associated with the partition whose coordinator is conducting the merge (or, if multiple entries exist in this partition, the primary replica). In all other contexts, the preferredEntry is simply the primary replica. The second parameter, "otherEntries", is a list of all other entries associated with the key for which a conflict was detected.
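For reference, here is a minimal sketch of the interface and a custom policy. The signatures follow the 9.1 conflict API; the PreferredOrFirstPolicy class is a hypothetical example that mirrors the behaviour of MergePolicies.PREFERRED_NON_NULL described in the table below.

```java
import java.util.List;

import org.infinispan.conflict.EntryMergePolicy;
import org.infinispan.container.entries.CacheEntry;

// A hypothetical custom policy: keep the preferred entry when present,
// otherwise fall back to the first of the other entries.
public class PreferredOrFirstPolicy<K, V> implements EntryMergePolicy<K, V> {

   @Override
   public CacheEntry<K, V> merge(CacheEntry<K, V> preferredEntry, List<CacheEntry<K, V>> otherEntries) {
      // A non-null return value is written to all replicas of the key;
      // returning null removes the key from all replicas.
      if (preferredEntry != null)
         return preferredEntry;
      return otherEntries.isEmpty() ? null : otherEntries.get(0);
   }
}
```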
Currently Infinispan provides the following implementations of EntryMergePolicy:
| Policy | Description |
|---|---|
| MergePolicies.PREFERRED_ALWAYS | Always utilise the "preferredEntry". |
| MergePolicies.PREFERRED_NON_NULL | Utilise the "preferredEntry" if it is non-null, otherwise utilise the first entry from "otherEntries". |
| MergePolicies.REMOVE_ALL | Always remove a key from the cache when a conflict is detected. |
Application Usage
For conflict resolution during partition merges, once an EntryMergePolicy has been configured for the cache, no additional actions are required by the user. However, if an Infinispan user would like to utilise the ConflictManager explicitly in their application, it should be retrieved by passing an AdvancedCache instance to the ConflictManagerFactory.
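For illustration, a minimal sketch (method names follow the 9.1 conflict API; the cache name "myCache" is illustrative):

```java
import java.util.Map;
import java.util.stream.Stream;

import org.infinispan.AdvancedCache;
import org.infinispan.conflict.ConflictManager;
import org.infinispan.conflict.ConflictManagerFactory;
import org.infinispan.container.entries.CacheEntry;
import org.infinispan.container.entries.InternalCacheValue;
import org.infinispan.manager.EmbeddedCacheManager;
import org.infinispan.remoting.transport.Address;

public class ConflictExample {
   public static void inspectAndResolve(EmbeddedCacheManager cacheManager) {
      // Retrieve the ConflictManager by passing an AdvancedCache to the factory
      AdvancedCache<String, String> cache =
            cacheManager.<String, String>getCache("myCache").getAdvancedCache();
      ConflictManager<String, String> crm = ConflictManagerFactory.get(cache);

      // All stored replica values for a single key, mapped by replica address
      Map<Address, InternalCacheValue<String>> versions = crm.getAllVersions("someKey");

      // A stream of all entries whose stored replicas have conflicting values
      Stream<Map<Address, CacheEntry<String, String>>> conflicts = crm.getConflicts();

      // Resolve all detected conflicts using the cache's configured merge policy
      crm.resolveConflicts();
   }
}
```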
Note that, depending on the number of entries in the cache, the getConflicts and resolveConflicts methods can be expensive operations, as they both depend on a spliterator which lazily loads cache entries on a per-segment basis. Consequently, when operating in distributed mode, if many conflicts exist, it is possible for an OutOfMemoryError to occur on the node searching for conflicts.
Partition Handling Strategies
In 9.1.0.Final the partition handling enabled/disabled option has been deprecated and users must now configure an appropriate PartitionHandling strategy for their application. A partition handling strategy determines what operations can be performed on a cache when a split brain event has occurred. Ultimately, in terms of Brewer’s CAP theorem, the configured strategy determines whether the cache’s availability or consistency is sacrificed in the presence of partition(s). Below is a table of the provided strategies and their characteristics:
| Strategy | Description | CAP |
|---|---|---|
| DENY_READ_WRITES | If the partition does not have all owners for a given segment, both reads and writes are denied for all keys in that segment. This is equivalent to setting partition handling to true in Infinispan 9.0. | Consistency |
| ALLOW_READS | Allows reads for a given key if it exists in this partition, but only allows writes if this partition contains all owners of a segment. | Availability |
| ALLOW_READ_WRITES | Allows entries on each partition to diverge, with conflicts resolved during merge. This is equivalent to setting partition handling to false in Infinispan 9.0. | Availability |
Conflict Resolution on Partition Merge
When utilising the ALLOW_READ_WRITES partition strategy it is possible for the values of cache entries to diverge between competing partitions. Therefore, when the two partitions merge, it is necessary for these conflicts to be resolved. Internally Infinispan utilises a cache’s ConflictManager to search for cache entry conflicts and then applies the configured EntryMergePolicy to automatically resolve said conflicts before rebalancing the cache. This conflict resolution is completely automatic and does not require any additional code or input from Infinispan users.
Note that if you do not want conflicts to be resolved automatically during a partition merge, i.e. the behaviour before 9.1.x, you can set the merge-policy to null (or NONE in XML).
Configuration
Programmatic
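A sketch of the programmatic configuration, assuming the 9.1 configuration builder API (package locations may vary between versions):

```java
import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.conflict.MergePolicies;
import org.infinispan.partitionhandling.PartitionHandling;

public class PartitionConfig {
   public static Configuration allowReadWrites() {
      ConfigurationBuilder builder = new ConfigurationBuilder();
      builder.clustering()
             .cacheMode(CacheMode.DIST_SYNC)
             .partitionHandling()
                .whenSplit(PartitionHandling.ALLOW_READ_WRITES)
                .mergePolicy(MergePolicies.PREFERRED_NON_NULL);
      return builder.build();
   }
}
```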
XML
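The equivalent declarative configuration, assuming the 9.1 schema's partition-handling element (the cache name is illustrative):

```xml
<distributed-cache name="myCache">
   <partition-handling when-split="ALLOW_READ_WRITES"
                       merge-policy="PREFERRED_NON_NULL"/>
</distributed-cache>
```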
Conclusion
Partition handling has been overhauled in Infinispan 9.1.0.Final to allow for increased control over a cache’s behaviour. We have introduced the ConflictManager which enables users to inspect and manage the consistency of their cache entries via custom and provided merge policies.
If you have any feedback on the partition handling changes, or would like to request some new features/optimisations, let us know via the forum, issue tracker or the #infinispan channel on Freenode (http://webchat.freenode.net/?channels=%23infinispan).
Tags: partition handling
Tuesday, 21 October 2014
Infinispan 7.0.0.CR2 released!
Dear community, the second release candidate of Infinispan 7 is out!
As we approach the final release, the main themes of this CR were bugfixes and enhancements, many related to partition handling.
Also included:
- Spring Cache Provider support for Spring 4.1 (thanks Sebastian Łaskawiec)
- Infinispan caches can now be exposed as OSGi managed services (thanks Bilgin Ibryam for the contribution!)
- Support for replicated caches on partition handling
- Cache.size() now returns the count across the entire cluster instead of the local count
For the complete list of changes, please consult the release notes.
If you have any questions, ask on our forums, mailing lists or directly on IRC (irc://irc.freenode.org/infinispan).
Tags: spring partition handling osgi release candidate
Monday, 25 August 2014
Partitioned clusters tell no lies!
The problem
You are happily running a 10-node cluster. You want failover and speed and are using distributed mode with 2 copies of data for each key (numOwners=2). But disaster strikes: a switch in your network crashes and 5 of your nodes can’t reach the other 5 anymore! Now there are two independent clusters, each containing 5 nodes, which we are smartly going to name P1 and P2. Both P1 and P2 continue to serve user requests (puts and gets) as usual.
This cluster split into two or more parts is called partitioning or split brain. And it’s bad for business, as in really bad! Bob and Alice share a bank account stored in the cache. Bob updates his account on P1, then Alice reads it from P2: she sees a stale value of Bob’s account (or even no value for Bob’s account, depending on what the split looks like). This is a consistency issue, as there’s an inconsistent view of the data between the two partitions.
Our solution
In Infinispan 7.0.0.Beta1 we added support for reacting to split brains: if nodes leave, Infinispan acknowledges that data might have been lost and denies user access to such data. We won’t deny access to all the data, but just the data that might have been affected by the partitioning. Or, more formally: Infinispan sacrifices data availability in order to offer consistency (CP in Brewer’s CAP theorem). For now partition handling is disabled by default; however, we do intend to make it the default in an upcoming release. Running with partition handling off is like running with scissors: do it at your own risk and only if you (don’t) know what you’re doing.
How we do it
A partition is assumed to happen when numOwners or more nodes disappear at the same time. When this happens, two (or more) partitions form which are not aware of each other. Each such partition does not start a rebalance, but enters degraded mode:
- requests (reads and writes) for entries that have all their copies on nodes within this partition are honored
- requests for entries that are partially or totally owned by nodes that disappeared are rejected with an AvailabilityException
To exemplify, consider the initial cluster C0 = {A,B,C,D}, where A, B, C and D are nodes, configured in distributed mode with numOwners=2. Further on, the cluster contains keys k1, k2 and k3 such that owners(k1) = {A,B}, owners(k2) = {B,C} and owners(k3) = {C,D}. If a partition happens and splits the cluster into C1 = {A,B} and C2 = {C,D}, degraded mode exhibits the following behavior (a code sketch follows the list):
- on C1, k1 is available for read/write; k2 (partially owned) and k3 (not owned) are not available, and accessing them results in an AvailabilityException
- on C2, k1 and k2 are not available for read/write; k3 is available
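A minimal sketch of what this looks like from client code on a node in C1, using the keys from the example above (the import path follows later releases; the exact package differed in early 7.0 betas):

```java
import org.infinispan.Cache;
import org.infinispan.partitionhandling.AvailabilityException;

public class DegradedModeExample {
   // Called on a node in partition C1 = {A,B} while the cache is in degraded mode
   public static void readKeys(Cache<String, String> cache) {
      String v1 = cache.get("k1"); // succeeds: both owners {A,B} are in C1

      try {
         cache.get("k2");          // owner C is in the other partition
      } catch (AvailabilityException e) {
         // k2 is only partially owned within C1, so reads and writes are rejected
      }
   }
}
```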
A relevant aspect of the partition handling process is the fact that when a split brain happens, the resulting partitions rely on the original consistent hash function (the one that existed before the split brain) in order to calculate key ownership. So it doesn’t matter whether k1, k2 or k3 already exist in the cluster or not, as availability is strictly determined by the consistent hash and not by the key’s existence.
If at a further point in time the initial partition C0 forms again, as a result of the network healing and the C1 and C2 partitions being merged back together, then C0 exits degraded mode and becomes fully available again.
Configuration for partition handling functionality
In order to enable partition handling within the XML configuration:
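A minimal sketch, assuming the 7.0 schema (the cache name and owner count are illustrative):

```xml
<distributed-cache name="myCache" owners="2">
   <partition-handling enabled="true"/>
</distributed-cache>
```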
The same can be achieved programmatically:
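A sketch based on the 7.0 configuration builder API:

```java
import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;

public class PartitionHandlingConfig {
   public static Configuration withPartitionHandling() {
      ConfigurationBuilder builder = new ConfigurationBuilder();
      builder.clustering().cacheMode(CacheMode.DIST_SYNC);
      builder.clustering().hash().numOwners(2);
      builder.clustering().partitionHandling().enabled(true);
      return builder.build();
   }
}
```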
The actual implementation is work in progress and Beta2 will contain further improvements which we will publish here!
Cheers, Mircea Markus
Tags: split brain partition handling availability